AITopics | question pair

Collaborating Authors

question pair

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

78f9c04bdcb06f1ada3902912d8b64ba-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-28-2026, 16:55:10 GMT

2: ... - Annotation: contains 83,591 annotated question pairs.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

78f9c04bdcb06f1ada3902912d8b64ba-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-15-2026, 01:53:31 GMT

The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011-2021, and sourced from arXiv.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

Measuring IIA Violations in Similarity Choices with Bayesian Models

Corrêa, Hugo Sales, Sankagiri, Suryanarayana, Figueiredo, Daniel Ratton, Grossglauser, Matthias

arXiv.org Machine LearningAug-21-2025

Similarity choice data occur when humans make choices among alternatives based on their similarity to a target, e.g., in the context of information retrieval and in embedding learning settings. Classical metric-based models of similarity choice assume independence of irrelevant alternatives (IIA), a property that allows for a simpler formulation. While IIA violations have been detected in many discrete choice settings, the similarity choice setting has received scant attention. This is because the target-dependent nature of the choice complicates IIA testing. We propose two statistical methods to test for IIA: a classical goodness-of-fit test and a Bayesian counterpart based on the framework of Posterior Predictive Checks (PPC). This Bayesian approach, our main technical contribution, quantifies the degree of IIA violation beyond its mere significance. We curate two datasets: one with choice sets designed to elicit IIA violations, and another with randomly generated choice sets from the same item universe. Our tests confirmed significant IIA violations on both datasets, and notably, we find a comparable degree of violation between them. Further, we devise a new PPC test for population homogeneity. Results show that the population is indeed homogenous, suggesting that the IIA violations are driven by context effects -- specifically, interactions within the choice sets. These results highlight the need for new similarity choice models that account for such context effects.

artificial intelligence, machine learning, participant, (15 more...)

arXiv.org Machine Learning

2508.14615

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.48)
Research Report > Experimental Study (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models

Xu, Zixiang, Wang, Yanbo, Huang, Yue, Chen, Xiuying, Zhao, Jieyu, Jiang, Meng, Zhang, Xiangliang

arXiv.org Artificial IntelligenceMay-27-2025

Large Language Models (LLMs) have achieved remarkable success in Natural Language Processing (NLP), yet their cross-lingual performance consistency remains a significant challenge. This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in LLMs. Our approach leverages beam search and LLM-based simulation to generate bilingual question pairs that expose performance discrepancies between English and target languages. We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models. The extensive experiments demonstrate that our method precisely and cost-effectively pinpoints cross-lingual weaknesses, consistently revealing over 50\% accuracy drops in target languages across a wide range of models. Moreover, further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns and benefit from targeted post-training. Code is available at https://github.com/xzx34/Cross-Lingual-Pitfalls.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.18673

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

Detecting Redundant Health Survey Questions Using Language-agnostic BERT Sentence Embedding (LaBSE)

Kang, Sunghoon, Kim, Hyeoneui, Park, Hyewon, Taira, Ricky

arXiv.org Artificial IntelligenceDec-4-2024

The goal of this work was to compute the semantic similarity among publicly available health survey questions in order to facilitate the standardization of survey-based Person-Generated Health Data (PGHD). We compiled various health survey questions authored in both English and Korean from the NIH CDE Repository, PROMIS, Korean public health agencies, and academic publications. Questions were drawn from various health lifelog domains. A randomized question pairing scheme was used to generate a Semantic Text Similarity (STS) dataset consisting of 1758 question pairs. Similarity scores between each question pair were assigned by two human experts. The tagged dataset was then used to build three classifiers featuring: Bag-of-Words, SBERT with BERT-based embeddings, and SBRET with LaBSE embeddings. The algorithms were evaluated using traditional contingency statistics. Among the three algorithms, SBERT-LaBSE demonstrated the highest performance in assessing question similarity across both languages, achieving an Area Under the Receiver Operating Characteristic (ROC) and Precision-Recall Curves of over 0.99. Additionally, it proved effective in identifying cross-lingual semantic similarities.The SBERT-LaBSE algorithm excelled at aligning semantically equivalent sentences across both languages but encountered challenges in capturing subtle nuances and maintaining computational efficiency. Future research should focus on testing with larger multilingual datasets and on calibrating and normalizing scores across the health lifelog domains to improve consistency. This study introduces the SBERT-LaBSE algorithm for calculating semantic similarity across two languages, showing it outperforms BERT-based models and the Bag of Words approach, highlighting its potential to improve semantic interoperability of survey-based PGHD across language barriers.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.03817

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > South Korea > Seoul > Seoul (0.05)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Racing Thoughts: Explaining Large Language Model Contextualization Errors

Lepori, Michael A., Mozer, Michael, Ghandeharioun, Asma

arXiv.org Artificial IntelligenceOct-2-2024

The profound success of transformer-based language models can largely be attributed to their ability to integrate relevant contextual information from an input sequence in order to generate a response or complete a task. However, we know very little about the algorithms that a model employs to implement this capability, nor do we understand their failure modes. For example, given the prompt "John is going fishing, so he walks over to the bank. Can he make an ATM transaction?", a model may incorrectly respond "Yes" if it has not properly contextualized "bank" as a geographical feature, rather than a financial institution. We propose the LLM Race Conditions Hypothesis as an explanation of contextualization errors of this form. This hypothesis identifies dependencies between tokens (e.g., "bank" must be properly contextualized before the final token, "?", integrates information from "bank"), and claims that contextualization errors are a result of violating these dependencies. Using a variety of techniques from mechanistic intepretability, we provide correlational and causal evidence in support of the hypothesis, and suggest inference-time interventions to address it.

contextualization, representation, subject entity, (12 more...)

arXiv.org Artificial Intelligence

2410.02102

Country:

Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.04)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DORIS-MAE: Scientific Document Retrieval using Multi-level Aspect-based Queries

Wang, Jianyou, Wang, Kaicheng, Wang, Xiaoyue, Naidu, Prudhviraj, Bergen, Leon, Paturi, Ramamohan

arXiv.org Artificial IntelligenceOct-28-2023

In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high cost and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research. We developed a benchmark dataset within the field of computer science, consisting of 100 human-authored complex query cases. For each complex query, we assembled a collection of 100 relevant documents and produced annotated relevance scores for ranking them. Recognizing the significant labor of expert annotation, we also introduce Anno-GPT, a scalable framework for validating the performance of Large Language Models (LLMs) on expert-level dataset annotation tasks. LLM annotation of the DORIS-MAE dataset resulted in a 500x reduction in cost, without compromising quality. Furthermore, due to the multi-tiered structure of these complex queries, the DORIS-MAE dataset can be extended to over 4,000 sub-query test cases without requiring additional annotation. We evaluated 17 recent retrieval methods on DORIS-MAE, observing notable performance drops compared to traditional datasets. This highlights the need for better approaches to handle complex, multifaceted queries in scientific research. Our dataset and codebase are available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset.

dataset, query, requirement, (14 more...)

arXiv.org Artificial Intelligence

2310.04678

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Overview (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multi-Tenant Optimization For Few-Shot Task-Oriented FAQ Retrieval

Vishwanathan, Asha, Warrier, Rajeev Unnikrishnan, Suresh, Gautham Vadakkekara, Kandpal, Chandra Shekhar

arXiv.org Artificial IntelligenceJan-25-2023

Business-specific Frequently Asked Questions (FAQ) retrieval in task-oriented dialog systems poses unique challenges vis-\`a-vis community based FAQs. Each FAQ question represents an intent which is usually an umbrella term for many related user queries. We evaluate performance for such Business FAQs both with standard FAQ retrieval techniques using query-Question (q-Q) similarity and few-shot intent detection techniques. Implementing a real world solution for FAQ retrieval in order to support multiple tenants (FAQ sets) entails optimizing speed, accuracy and cost. We propose a novel approach to scale multi-tenant FAQ applications in real-world context by contrastive fine-tuning of the last layer in sentence Bi-Encoders along with tenant-specific weight switching.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2301.10517

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.05)
Asia > Singapore (0.04)
(2 more...)

Genre: Frequently Asked Questions (FAQ) (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

QBERT: Generalist Model for Processing Questions

Xu, Zhaozhen, Cristianini, Nello

arXiv.org Artificial IntelligenceDec-4-2022

Using a single model across various tasks is beneficial for training and applying deep neural sequence models. We address the problem of developing generalist representations of text that can be used to perform a range of different tasks rather than being specialised to a single application. We focus on processing short questions and developing an embedding for these questions that is useful on a diverse set of problems, such as question topic classification, equivalent question recognition, and question answering. This paper introduces QBERT, a generalist model for processing questions. With QBERT, we demonstrate how we can train a multi-task network that performs all question-related tasks and has achieved similar performance compared to its corresponding single-task models.

classification, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2212.01967

Country:

Europe > United Kingdom (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

What makes us curious? analysis of a corpus of open-domain questions

Xu, Zhaozhen, Howarth, Amelia, Briggs, Nicole, Cristianini, Nello

arXiv.org Artificial IntelligenceOct-28-2021

Every day people ask short questions through smart devices or online forums to seek answers to all kinds of queries. With the increasing number of questions collected it becomes difficult to provide answers to each of them, which is one of the reasons behind the growing interest in automated question answering. Some questions are similar to existing ones that have already been answered, while others could be answered by an external knowledge source such as Wikipedia. An important question is what can be revealed by analysing a large set of questions. In 2017, "We the Curious" science centre in Bristol started a project to capture the curiosity of Bristolians: the project collected more than 10,000 questions on various topics. As no rules were given during collection, the questions are truly open-domain, and ranged across a variety of topics. One important aim for the science centre was to understand what concerns its visitors had beyond science, particularly on societal and cultural issues. We addressed this question by developing an Artificial Intelligence tool that can be used to perform various processing tasks: detection of equivalence between questions; detection of topic and type; and answering of the question. As we focused on the creation of a "generalist" tool, we trained it with labelled data from different datasets. We called the resulting model QBERT. This paper describes what information we extracted from the automated analysis of the WTC corpus of open-domain questions.

arxiv preprint arxiv, corpus, dataset, (11 more...)

arXiv.org Artificial Intelligence

2110.15409

Country:

Europe > United Kingdom > England > Bristol (0.05)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.36)

Add feedback